Revealing clinically relevant specific IgE sensitization patterns in Hymenoptera venom allergy with dimension reduction and clustering

Background: Immunoglobulin E (IgE) blood tests are used to detect sensitizations and potential allergies. Recent studies suggest that specific IgE sensitization patterns due to molecular interactions affect an individual's risk of developing allergic symptoms.
Objective: The aim of this study was to reveal specific IgE sensitization patterns and investigate their clinical implications in Hymenoptera venom allergy.
Methods: In this cross-sectional study, 257 hunters or fishers who completed self-administered questionnaires on previous Hymenoptera stings were analyzed. Blood samples were taken to determine Hymenoptera IgE sensitization levels. Using dimensionality reduction and clustering, specific IgE levels for 10 Hymenoptera venom allergens were evaluated for clinical relevance.
Results: Three clusters were unmasked using novel dimensionality reduction and clustering methods based solely on specific IgE levels to Hymenoptera venom allergens. These clusters show different characteristics regarding previous systemic reactions to Hymenoptera stings.
Conclusion: Our study was able to unmask non-linear sensitization patterns for specific IgE tests in Hymenoptera venom allergy. We were able to derive risk clusters for anaphylactic reactions following Hymenoptera stings and pinpoint relevant allergens (rApi m 10, rVes v 1, whole honey bee venom, and whole wasp venom) for clustering.


INTRODUCTION
Allergic diseases affect as many as 30% of the world's population and represent a major socioeconomic burden. 1,2 Despite low mortality, allergies may lead to a significant reduction in quality of life. 3 Symptoms range from persistent nasal obstruction and eye watering, as seen in tree pollen allergies, to severe anaphylactic reactions and death, as observed in peanut, Hymenoptera venom, and drug allergies. According to recent studies, the prevalence of allergic diseases is on the rise globally. 4,5 Diagnostics for IgE-mediated allergies, like Hymenoptera venom allergies, involve medical history assessment, skin tests, and in-vitro blood tests such as specific IgE measurements. 6 Yet, the quantitative interpretation of specific IgE levels and their clinical relevance remains an area of contention. 7 The current cut-off values for determining sensitivity are widely debated, and while some progress has been made in certain allergy diagnostics, 8,9 there remain challenges in others like drug hypersensitivity 10 and specific food allergies. 11 [13][14][15][16] These studies suggest, if at all, a complex, not strictly linear interaction pattern of specific IgE levels and indicate varied results across allergic diagnostics.
Cross-reactive carbohydrate determinants (CCDs) present in venom extracts can complicate the identification of primary Hymenoptera allergies. To address this, molecular or component-resolved diagnosis (CRD) 17,18 uses recombinant allergen proteins that are free of CCDs. For Hymenoptera venom allergies, CRD isolates specific allergens, including Api m 2, 3, 4, 10, Ves v 1, and Ves v 5, establishing them as markers for primary honey bee or wasp venom allergies. In contrast, the cross-reactivity between Api m 5 and Ves v 3 hinders their use as reliable marker allergens. Distinguishing primary allergies from cross-reactivity is crucial in selecting the appropriate immunotherapy. In addition, a dominant Api m 10 sensitization is a marker for increased risk of immunotherapy failure. 19

New evolving mathematical models, such as UMAP (Uniform Manifold Approximation and Projection), 20 are revolutionizing the field by offering more accurate, lower-dimensional representations of higher-dimensional datasets. This stands in contrast to traditional network analysis, which often relies on less precise layouts. 21 Specifically, UMAP is gaining traction due to its capability to better preserve both local and global structures within high-dimensional data, offering advantages over traditional methods like t-Distributed Stochastic Neighbor Embedding (t-SNE) 20 or principal component analysis (PCA). 22

On the other hand, HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) 23 stands out as an advanced clustering algorithm. Unlike traditional clustering methods (e.g., k-means clustering or latent class analysis 24 ) that may require specifying the number of clusters beforehand, HDBSCAN operates by identifying clusters of varying densities. When UMAP and HDBSCAN are combined, they offer an effective approach to identify and visualize clusters in reduced-dimensional space.
In this study, we applied current scalable, state-of-the-art mathematical methods to allergology by combining a dimension reduction method with a clustering algorithm to assess a) if Hymenoptera specific IgE sensitization patterns are unmasked visually in the lower dimensional space, b) the implications for laboratory measurements and clinical application of these methods, and c) if these methods can help to optimize the allocation of resources and suggest a more cost-efficient use of laboratory tests.

Key messages
UMAP dimension reduction and HDBSCAN clustering can be used synergistically to derive clusters from quantitative specific IgE sensitization levels to Hymenoptera venom allergens
Cluster association has clinical implications on previous systemic reactions to Hymenoptera stings
We suggest a more resource-efficient use of Hymenoptera specific IgE in-vitro tests (rApi m 10, rVes v 1, whole honey bee, and wasp venom)

METHODS
Study design
In a cross-sectional study conducted in the Greater Munich Area, Southern Germany, during winter 2016 and January 2017, 257 adult individuals were recruited from the annual winter meetings of three hunting associations and an international hunting and fishing exhibition and subsequently analyzed. 25 After providing written informed consent, participants filled out a self-reported questionnaire capturing general demographics, hunting/fishing activities, known allergies, and experiences with Hymenoptera stings. The questionnaire assessed reactions to insect stings based on local reactions and questions according to the European Grading of Anaphylactic Symptoms 26,27 (anaphylaxis grade 1 to grade 4). Blood samples were subsequently collected and stored for later in vitro testing. The ImmunoCAP™ system measured total and specific IgE levels, along with serum tryptase. 25

Variables, data sources, measurement
Our primary independent variables were the subject's age, total IgE (kU/l), tryptase (µg/l), and each subject's measured specific IgE levels (kUA/l) to 10 different Hymenoptera allergens (whole honey bee venom [i 1], whole wasp venom [i 3], rApi m 1, rApi m 2, rApi m 3, rApi m 5, rApi m 10, rVes v 1, rVes v 5, and MUXF3, a CCD marker molecule) using ImmunoCAP™ Allergy Testing Solutions and the Phadia™ Laboratory System by Thermo Fisher Scientific.

Dimension reduction
Dimension reduction was performed using specific IgE values only. Uniform Manifold Approximation and Projection (UMAP) 28 was applied (Fig. 1). The result was a two-dimensional representation of the higher dimensional specific IgE measurements with preserved local and global structure. 20

Clustering
A high-performance implementation 23 of Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) 29 was utilized, using unsupervised learning to find clusters (= dense regions) in the embeddings (Fig. 1). After visual exploration of the lower-dimensional representation and the cluster hierarchy dendrogram (Figs. 2 and 3, B), we set the parameters to reach the best fit of the projected dense regions. We used the Manhattan distance as the metric for the higher-dimensional data. 30 A dot in the graph represents an individual subject. The subjects are colored either by cluster association or by other attributes as stated (Figs. 2 and 3). For reproducibility, a random seed of 42 was set.
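As a rough illustration of the two steps described above, the following Python sketch embeds a subjects-by-allergens matrix of specific IgE values with UMAP, using the Manhattan metric and a seed of 42, and then clusters the two-dimensional embedding with HDBSCAN. The function name `embed_and_cluster`, the parameters `n_neighbors` and `min_cluster_size`, and their default values are assumptions made for illustration only; the parameters actually used in the study were tuned by visual inspection and are not stated here. The small `manhattan` helper merely shows what the distance metric computes for two sensitization profiles.

```python
def manhattan(profile_a, profile_b):
    """Manhattan (L1) distance between two specific IgE profiles."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def embed_and_cluster(sige_matrix, n_neighbors=15, min_cluster_size=15, seed=42):
    """Embed specific IgE profiles in 2D with UMAP, then cluster with HDBSCAN.

    sige_matrix: array-like of shape (n_subjects, n_allergens).
    n_neighbors and min_cluster_size are hypothetical defaults, not the
    values used in the study.
    """
    # Third-party imports are kept local so the helper above stays usable
    # even where umap-learn and hdbscan are not installed.
    import umap
    import hdbscan

    reducer = umap.UMAP(
        n_components=2,        # two-dimensional embedding
        n_neighbors=n_neighbors,
        metric="manhattan",    # distance in the original allergen space
        random_state=seed,     # fixed seed for reproducibility
    )
    embedding = reducer.fit_transform(sige_matrix)

    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    labels = clusterer.fit_predict(embedding)  # label -1 marks unclustered points
    return embedding, labels


# Two hypothetical profiles with three allergens each (kUA/l)
print(manhattan([0.1, 2.0, 0.35], [0.6, 0.5, 0.35]))
```

In such a sketch, subjects labeled -1 by HDBSCAN correspond to points that are not assigned to any dense region. The fitted clusterer also exposes the cluster hierarchy (for example through its `condensed_tree_` attribute), which can be plotted to guide parameter choice.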

Statistical analysis
Data were analyzed using Python 3.8.8 with the statistical libraries pandas 1.3.0, SciPy 1.5.2, statsmodels 0.12.0, umap 0.5.1, and hdbscan 0.8.26. In addition to descriptive parameters, we report all results as mean with 95% confidence intervals. Specific IgE values were log-transformed in graphs for better comparability. Logistic regression was performed for binary outcome variables using the Broyden-Fletcher-Goldfarb-Shanno algorithm (BFGS) as an optimizer, and odds ratios with 95% confidence intervals and p-values are shown.

Fig. 1 Dimension reduction and clustering algorithm. A step-by-step study protocol for clustering higher dimensional data after applying dimensionality reduction
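To make the "mean with 95% confidence interval" summary concrete, the sketch below computes it for one group of values using only the Python standard library. It assumes a normal approximation for the interval, which is an assumption of this example; the text does not state how the intervals were derived. The function name `mean_ci` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    n = len(values)
    m = mean(values)
    sem = stdev(values) / math.sqrt(n)        # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return m, m - z * sem, m + z * sem


# Hypothetical specific IgE values (kUA/l) for one cluster
m, lower, upper = mean_ci([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"mean={m:.2f}, 95% CI=[{lower:.2f}, {upper:.2f}]")
```

For small groups, a t-distribution based interval would be wider than this normal approximation.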

RESULTS
Cluster analysis of the hunter/fisher group
UMAP dimension reduction followed by HDBSCAN, guided by the cluster hierarchy dendrogram, revealed 3 principal clusters derived from specific IgE sensitizations to Hymenoptera venom allergens in the hunter/fisher group (Fig. 2, A-C). Subjects not affiliated with any of the 3 clusters (n = 2, 0.78%) or who did not answer the question on previous Hymenoptera stings (n = 12, 4.7%) were excluded from further analysis.

Specific IgE characteristics of whole venom extracts
The analysis of whole venom extracts for honey bee (i 1) and wasp (i 3) revealed a different focus for every cluster. While specific IgE for honey bee (i 1) differed only between clusters 0 and 1 (mean difference: 2.23 ± 3.13, p = 0.003), specific IgE for wasp (i 3) was significantly different between clusters 0 and 2 (mean difference: 2.22 ± 2.18, p = 0.001) as well as between clusters 1 and 2 (mean difference: 1.78 ± 2.26, p = 0.001).

Specific IgE characteristics of Hymenoptera recombinants
Specific IgE characteristics of Hymenoptera recombinants were analyzed across the 3 distinct clusters. The variations in specific IgE levels between clusters for the tested recombinants are summarized in Table 1. Specific IgE values for rApi m 1, 2, 3, and 10 were highest in cluster 1 compared to the other 2 clusters. In contrast, the specific IgE values for rVes v 1 and 5 had their peak values in cluster 2. Notably, rApi m 5 and CCD MUXF3 showed no significant variations in specific IgE levels across the clusters. When considering the mean specific IgE values for all recombinants, values were notably lower in cluster 0 compared to clusters 1 and 2, with no significant difference between clusters 1 and 2. In summary, cluster 0 was characterized by peaks in specific IgE for honey bee (i 1), the wasp recombinant rVes v 1, and low levels of CCD MUXF3. Cluster 1 showed the highest average specific IgE to all tested allergens, with a peak in specific IgE for honey bee (i 1). Cluster 2 primarily consisted of high peaks in specific IgE for whole wasp venom (i 3) followed by the wasp recombinants rVes v 1 and rVes v 5 (Fig. 2, A).

Clinical manifestation of Hymenoptera venom allergies
In the previous section, we elaborated on the implications of cluster affiliations on Hymenoptera specific IgE sensitization patterns. When the effect of cluster association on previous systemic reactions following Hymenoptera stings was evaluated, we found an increasing risk from cluster 0 (23.1%) to cluster 2 (38.5%), and to cluster 1 (  All other Hymenoptera specific IgE levels (wasp (i 3), rApi m 2, 3, 5, rVes v 1, 5 and MUXF3) showed no predictive benefit. In the case of at least moderate systemic reactions to Hymenoptera stings, the only predictor was the cluster association (OR: 0.17 [0.09, 0.32], p < 0.001, Table 2, Fig. 2), which is reflected in the increased number of at least moderate systemic reactions from the lower-risk cluster 0 (1.3%) over the intermediate-risk cluster 2 (9.4%) to the highest-risk cluster 1 (17.4%).
These data suggest that clusters derived from quantitative Hymenoptera-specific IgE levels are not only a predictor, among others, for previous systemic reactions but also the only predictor for previous moderate to severe systemic reactions.
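The odds ratios with 95% confidence intervals reported above follow from a logistic regression coefficient and its standard error by exponentiation. The sketch below shows that relationship with the standard library only. The function names are hypothetical, and the standard error is back-calculated from the reported interval for cluster association (OR 0.17 [0.09, 0.32]) under the assumption of a symmetric Wald interval on the log-odds scale; it is not a value taken from the study.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95):
    """Odds ratio and Wald confidence interval from a logit coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, level=0.95):
    """Back-calculate the standard error from an odds-ratio interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported values for cluster association (moderate to severe reactions)
beta = math.log(0.17)            # coefficient on the log-odds scale
se = se_from_ci(0.09, 0.32)      # assumed Wald interval, see lead-in
odds_ratio, lower, upper = odds_ratio_ci(beta, se)
print(f"OR={odds_ratio:.2f} [{lower:.2f}, {upper:.2f}]")
```

The reconstructed interval agrees with the reported one to two decimals, which is consistent with a Wald interval, although the rounding of the published values limits how much can be concluded from this check.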

DISCUSSION
The role of quantitative specific IgE in vitro tests on allergy clinical manifestations has been a matter of prolonged scientific debate. 31,32 The findings in our study are in agreement with previous studies on the mixed importance of Hymenoptera recombinants. 33,34 More accurate dimensionality reduction methods like UMAP 28 have already been successfully applied to medical science and used for identifying cell types and trajectories in embryology 35 as well as the exploration of finer structures in cellular development. 36 Novel clustering algorithms like the recently developed HDBSCAN, which is both a hierarchical and a density-based clustering algorithm, 23 have previously only been applied outside of the medical field, e.g., in chemistry to uncover large-scale conformational change in molecules. 37

Table 3. Binary logistic regression for anaphylactic reactions in the hunter/fisher group. Results of the binary logistic regression model of the given independent variables (more than 5 stings, cluster association and specific IgE levels to Hymenoptera venom recombinants) as predictors for (A) at least a mild systemic reaction or (B) at least a moderate systemic reaction following a Hymenoptera sting. Clusters were derived from the whole venom extracts as well as rVes v 1 and rApi m 10. Both cluster association and rApi m 10 sensitization levels showed a predictive value for a systemic reaction following Hymenoptera stings. For moderate to severe systemic reactions, cluster association was the only variable providing a predictive value in our model.

The assessment of sensitization patterns with a different approach like latent class analysis (LCA) has successfully been conducted in asthmatic patients. 38 As seen with k-means clustering, a downside of LCA is that the number of groups needs to be determined before analysis and that the dependent variables need to be categorical. The methodology around network analysis revealed previously unknown disease entities in the field of psychopathology 39 as well as disease characteristics in asthmatic patients following sensitization clustering. 40,41 Unlike UMAP, however, network analysis commonly uses force-directed layouts, 42 which distort the true nature of higher dimensional datasets. 21

The risk associated with more than 5 previous Hymenoptera stings, shown in the previous study on the hunter/fisher cohort, was outweighed by the higher predictive value of the cluster association in our model. The authors of the previous study were "unable to find a correlation between reaction severity and sensitization to any of the hitherto available recombinant allergens". 25 Our study suggests the necessity to assess the specific IgE levels in a non-linear, holistic approach. By using UMAP dimensionality reduction and HDBSCAN clustering as well as by including all Hymenoptera venom allergens quantitatively, cluster associations, and number of previous stings in a single logistic regression model fitted with the BFGS optimization method, we were able to show that the associated clusters are the only means to predict previous moderate to severe systemic reactions. We also demonstrated a distinction between subjects with and without systemic reactions following Hymenoptera stings.
To our knowledge, the presented data are the first to apply modern dimension reduction and clustering methods to the field of Hymenoptera venom allergies and show a diagnostic benefit. We believe that complex interactions in IgE-mediated allergies are reflected in different specific IgE sensitization patterns, which cannot be assessed by classic, linear statistics. In line with these findings, other fields in allergology related to IgE-mediated type I allergies, like asthmatic diseases, showed the benefit of a holistic analysis of more than a single allergen. We present evidence that there is no need for individual cut-off values for every single specific IgE antibody in Hymenoptera venom allergies, as the raw quantification of specific IgE antibodies can be used in clustering algorithms to subsequently derive meaningful clinical implications.
In other fields, the discovery of new recombinant allergens improved diagnosis and risk stratification based on an individual single allergen sensitization. Advances have been shown for recombinant allergens like rPru p 3 43,44 in peach allergies and rAra h 1, 2, 3 in peanut allergies. 45,46 Milk and egg allergies, 17 allergic rhinitis, and asthma have also profited from new recombinants. 47 [49][50] Particularly rApi m 10 was shown to be a risk factor for treatment failure of honey bee venom allergies 51 and an indicator for a primary sensitization to honey bee venom. 52 Other Hymenoptera recombinants showed only questionable benefit for allergy diagnostics. 33,53 We demonstrated that the usefulness of rApi m 10 can be extended as a quantitative input in a holistic assessment with other allergens using dimensionality reduction and clustering methods. Similar behavior was observed for peach allergies, where different co-sensitizations between rPru p 3, rPru p 1, and rPru p 4 potentially enhanced risk assessment. 44

In line with the previous studies, not all recombinants appear to be of equal value in Hymenoptera venom allergies. While rApi m 10 was a good single-allergen predictor as previously known, all other recombinants (rApi m 1, 2, 3, 5, rVes v 1 and 5) did not provide a clear benefit for clinical interpretation. While Api m 10 stands out in bee allergy significance, we still must consider the other recombinants at lower levels, as low-titer sensitizations can also cause severe clinical reactions 54 and indicate the need for immunotherapy. Sole reliance on predominant allergens like Api m 10 could be misleading.
True dual sensitization to both bee and wasp venom differs significantly from mere cross-reactivity, such as specific IgE against CCDs without elevated anaphylactic reaction risk. Cross-reactivity between venoms often results from shared allergenic components, like CCDs. In contrast, true dual sensitization implies independent sensitizations to each insect's distinct allergens. This differentiation is crucial for diagnosis, management, and therapeutic strategies, including double venom immunotherapies. Certain studies suggest the basophil activation test for differentiating true dual sensitization. 55

The abundance of available specific IgE recombinants in Hymenoptera venom allergies necessitates responsible and economical use. As the previous results suggest, redundancy and clinically irrelevant sensitizations exist when testing the whole Hymenoptera panel. We attempted to replicate the results with a more cost-efficient use of the available recombinants (honey bee [i 1], wasp [i 3], rApi m 10, and rVes v 1). We were able to retain most clustering characteristics with only 40% of the allergens.
Our study has several limitations that must be acknowledged. First, due to the constrained sample size, our findings are primarily based on internal validation, with external testing and predictions not conducted. Second, the specificity of our cohort, comprising largely hunters and fishers, restricts the general applicability of our results. This group has a heightened exposure to Hymenoptera stings, which might not mirror the broader population's experience. Third, the dendrogram-based clustering, while insightful, is not without challenges; it is not a fully automated method and demands manual parameter calibration by researchers. It does offer the advantage of not requiring pre-set cluster numbers, unlike some other clustering methodologies. Despite these challenges, our study offers valuable perspectives into Hymenoptera venom allergies, aiding risk assessment and suggesting ways to enhance the efficiency of in-vitro specific IgE testing.

Fig. 2 Cluster analysis through specific IgE sensitization patterns to Hymenoptera venom allergens (n = 257). Mean specific IgE values with 95% confidence intervals for the different Hymenoptera venom allergens in the three different clusters in the hunter/fisher group (A). Two unclustered subjects were omitted. Clustering parameters were chosen with the help of the cluster hierarchy dendrogram (B). The lower dimensional representations, also called embeddings, are visualized and color-coded by associated cluster (C), a subject's previous systemic reaction (D), and a subject's previous moderate to severe systemic reaction following Hymenoptera stings (E).

Fig. 3 Cluster analysis through specific IgE sensitization patterns to Hymenoptera venom allergens (n = 257). Mean specific IgE values with 95% confidence intervals for the different Hymenoptera venom allergens in the three different clusters in the hunter/fisher group. Only the two whole venom extracts, honey bee (i 1) and wasp (i 3), and the recombinants rVes v 1 and rApi m 10 (circled) were used in the dimension reduction and clustering analysis (A). Clustering parameters were chosen with the help of the cluster hierarchy dendrogram (B). The lower dimensional representations, also called embeddings, are visualized and color-coded by associated cluster (C), a subject's previous systemic reaction (D), and a subject's previous moderate to severe systemic reaction following Hymenoptera stings (E).

Table 1. Comparative IgE sensitization across clusters. The table displays the mean and 95% confidence intervals of specific IgE levels for various Hymenoptera venom recombinants across three distinct clusters (Cluster 0, 1, and 2). Significant differences between the clusters with their respective p values are derived from a Tukey's HSD test after a one-way ANOVA. Allergens with no significant variations across clusters (p > 0.05) are shaded in grey. Specific IgE levels for a given cluster that demonstrates higher levels compared to the

However, our study for the first time describes the introduction of dimensionality reduction paired with clustering algorithms to derive risk clusters, which have meaningful

Furthermore, training machine learning algorithms to tackle the complex nature of specific IgE sensitization usually requires large amounts of data, whereas dimensionality reduction approaches can provide deterministic predictions on a far smaller scale without overfitting. The assessment of

Table 2. Binary logistic regression for anaphylactic reactions in the hunter/fisher group. Results of the binary logistic regression model of the given independent variables (more than 5 stings, cluster association and specific IgE levels to Hymenoptera venom recombinants) as predictors for (A) at least a mild systemic reaction or (B) at least a moderate systemic reaction following a Hymenoptera sting. Clusters were derived from the complete Hymenoptera panel. Cluster association as well as the levels of honey bee (i 1), rApi m 1, and rApi m 10 specific IgE sensitization showed a predictive value for systemic reactions to Hymenoptera stings. For moderate to severe systemic reactions following Hymenoptera stings, cluster association was the only variable that provided a predictive value in our model.